Hide and seek on complex networks 
Signaling pathways and networks determine the ability to communicate in systems ranging from living cells
to human society. We investigate how the network structure constrains communication in social, man-made and
biological networks. We find that human networks of governance and collaboration are predictable on a tête-à-tête
level, reflecting well-defined pathways, but globally inefficient. In contrast, the Internet tends to have better
overall communication abilities, more alternative pathways, and is therefore more robust. Between these extremes,
the molecular network of Saccharomyces cerevisiae is more similar to the simpler social systems, whereas the
pattern of interactions in the more complex Drosophila melanogaster resembles the robust Internet.
Information exchange between distant parts of a complex
system is essential for its global functionality. For example,
without the adaptability to environmental changes, maintained
by communication through signaling pathways, perturbations
would be fatal for living cells. Similarly, human society
needs to maintain global cooperativity in order to be functional.
No parts of such complex systems are complete, but
all parts are in contact with each other through a network of
distributed communication. The speed and reliability of the
information transfer is closely linked to the network architecture [?].
This interdependence can be characterized in terms of
information measures and Shannon entropies [7]. That is, we
measure the number of bits of information
required to transmit a message to a specific remote part of
the network (Fig. 1a), or conversely, to predict from where a
message is received (Fig. 1, b and c). We will thus present
information measures related to the network capacity for specific
communication. The introduced measures are not to be
confused with the Shannon entropies that have earlier been
assigned to the network degree distribution [8], or to
the long time amplification of the dominant eigenvector of the
network adjacency matrix [?].

In practice, imagine that you at node i want to send a message
to node b in a given network (Fig. 1a). This could for
example correspond to sending an e-mail over the Internet.
For simplicity we assume that the message follows the shortest
path, or if there are several degenerate shortest paths, it is sent
along one of them. For each shortest path p(i, b) we calculate the
probability to follow this path (Fig. 1a), if one without information
would choose any new direction with equal probability:

$$P\{p(i,b)\} = \frac{1}{k_i} \prod_{j \in p(i,b)} \frac{1}{k_j - 1}, \qquad (1)$$

where $k_j$ is the number of links of node j, with j counting all nodes
on the path from node i to b until the last node before the target
node b is reached (node i itself enters through the separate factor
$1/k_i$). The factor $k_j - 1$ instead of $k_j$ takes into account the
information we gain by following the path, which reduces the number of exit
links by one. The total information needed to identify one of
all the degenerate paths between i and b defines the "search
information"

$$S(i \to b) = -\log_2 \sum_{\{p(i,b)\}} P\{p(i,b)\}, \qquad (2)$$
FIG. 1: Information measures on network topology. Panel titles:
search information, $S(i \to b) = -\log_2 \sum_{\{p(i,b)\}} P\{p(i,b)\}$;
target entropy, $T_i = -\sum_j c_{ij} \log_2 c_{ij}$;
road entropy, $R_i = -\sum_j b_{ij} \log_2 b_{ij}$.
(a) Search information $S(i \to b)$ measures your ability to locate node b from node
i. (b) Target entropy $T_i$ measures predictability of traffic to you
located at node i, and (c) road entropy $R_i$ measures predictability of
traffic around i. $S(i \to b)$ is the number of yes/no questions needed
to locate any of the shortest paths between node i and node b. For
each such path $P\{p(i,b)\} = \frac{1}{k_i} \prod_j \frac{1}{k_j - 1}$, with j counting nodes on
the path p(i, b) until the last node before b. $c_{ij}$ is the fraction of the
messages targeted to i that passed through neighbor node j. $b_{ij}$ is the
fraction of messages that go through node i which also go through
neighbor node j.
where the sum runs over all degenerate paths that connect i
with b. A large $S(i \to b)$ means that one needs many yes/no
questions to locate b. The existence of many degenerate paths
will be reflected in a small S and consequently in easy goal
finding.
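
As an illustration, the search information can be evaluated with a breadth-first search that accumulates, for every node, the summed probability of all shortest paths reaching it, where each path carries the weight $(1/k_i) \prod_j 1/(k_j - 1)$ stated in the caption of Fig. 1. The following is a minimal sketch and not the implementation used for the results reported here; the function name `search_information` and the adjacency-dictionary input format are assumptions made for this example.

```python
from collections import deque
from math import log2


def search_information(adj, source, target):
    """Return S(source -> target) in bits for an undirected graph.

    adj maps each node to a list of its neighbors (hypothetical input format).
    """
    if source == target:
        return 0.0
    # Breadth-first search gives shortest-path distances from the source.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if target not in dist:
        raise ValueError("target is not reachable from source")

    # weight[v]: summed probability of all shortest paths from source to v.
    weight = {v: 0.0 for v in dist}
    weight[source] = 1.0
    for u in sorted(dist, key=dist.get):
        if dist[u] >= dist[target]:
            break
        # The source chooses among k_i links; later nodes among k_j - 1,
        # because the link the message arrived on is excluded.
        exits = len(adj[u]) if u == source else len(adj[u]) - 1
        for v in adj[u]:
            if dist[v] == dist[u] + 1:
                weight[v] += weight[u] / exits
    # Adding 0.0 turns a possible -0.0 into 0.0.
    return -log2(weight[target]) + 0.0


# Star: hub 0 with three leaves. Ring: four nodes in a cycle.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(search_information(star, 1, 2))  # leaf to leaf: 1 bit
print(search_information(ring, 0, 2))  # two degenerate paths: 0 bits
```

In the ring each of the two shortest paths has probability 1/2, so their sum is 1 and no information is needed, which is the sense in which degenerate paths make goal finding easy.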

The practical question is thus: Which position provides best
access to the entire network? Surfing the Web, which webpage
should be the start page when easy access to any other
page is essential? The answer is the node with minimal access
information, $\mathcal{A}_i = \sum_b S(i \to b)$. The networks in Fig. 2, a to
c, are color coded according to $\mathcal{A}_i$. Fig. 2b illustrates that
hubs, and often nodes directly connected to hubs, give best
access to the system. Overall one can see that it is easy to
access other nodes in the network in Fig. 2a, whereas it is
much more difficult in Fig. 2c. In fact the network in Fig. 2b





FIG. 2: Hide and seek in complex networks. In (a-c) we show two networks obtained by (a) minimizing and (c) maximizing S, while keeping
the degree distribution identical to the Canadian hardwired Internet in (b). This network was selected as a typical communication network
[11, 12, 13], with a broad power-law degree distribution $P(>k)$. The color of each node i shows the value $\mathcal{A}_i = \sum_b S(i \to b)$, that measures how
easy it is to find other nodes when starting at node i. In (d-f) we show the same networks, but color coded according to how difficult it is to find
the nodes, $\mathcal{H}_b = \sum_i S(i \to b)$.



is the Canadian Internet [20], whereas the networks in Fig.
2, a and c, are obtained by rewiring the Canadian network
to, respectively, minimize and maximize $S = \sum_i \mathcal{A}_i / N$ while
keeping the network connected and conserving the degree
of all nodes [14]. N is the number of nodes in the connected
network.
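
A common way to rewire a network while conserving every node's degree and keeping the network connected is to repeatedly swap the end points of two randomly chosen links. The sketch below shows only this constrained random swap, under the assumption that links are given as a list of node pairs; the function names are hypothetical. Driving S toward a minimum or maximum would additionally require accepting or rejecting each swap according to the change in S, which is not shown.

```python
import random
from collections import deque


def is_connected(adj):
    """Breadth-first check that every node is reachable from one start node."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def randomize_preserving_degrees(edges, n_swaps, seed=0, max_attempts=10000):
    """Swap link end points at random; degrees and connectivity are kept."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    done = attempts = 0
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        x, y = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[x], edges[y]
        if rng.random() < 0.5:  # consider both orientations of the second link
            c, d = d, c
        # Proposed swap: a-b, c-d  ->  a-d, c-b
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue  # would create a self-link or a double link
        adj[a].remove(b); adj[b].remove(a)
        adj[c].remove(d); adj[d].remove(c)
        adj[a].add(d); adj[d].add(a)
        adj[c].add(b); adj[b].add(c)
        if is_connected(adj):
            edges[x], edges[y] = (a, d), (c, b)
            done += 1
        else:  # undo a swap that would disconnect the network
            adj[a].remove(d); adj[d].remove(a)
            adj[c].remove(b); adj[b].remove(c)
            adj[a].add(b); adj[b].add(a)
            adj[c].add(d); adj[d].add(c)
    return edges
```

Checking connectivity after every swap is simple but costly for large networks; it is used here only to keep the sketch short.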

Naturally, the next question is: Where is it best to hide?
That is where $\mathcal{H}_b = \sum_i S(i \to b)$ is maximal. Note that
maximizing everyone's ability to hide, $\sum_b \mathcal{H}_b = \sum_i \mathcal{A}_i = S \cdot N$, is
equivalent to maximizing the search information and therefore
minimizing everybody's ability to search. Thus we illustrate
the value of $\mathcal{H}_b$ in Fig. 2, d to f, for the same networks as
in Fig. 2, a to c. In agreement with intuition we indeed find
that hubs are easily accessible by other nodes and thus are bad
places for hiding. Rather one should hide on nodes on the
periphery. Is it possible for a node to have good access to
other nodes but not be easily accessible at the same time? The
compromise favors a position on a neighbor to a hub. For
example, if we consider the network implementation of a city
with roads as nodes and intersections as links, it is preferable
to have an address on a small road that connects directly to a
major road/hub.
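
A small worked case may make the trade-off concrete. Consider a star network with one hub h and k leaves, and take the access information of a node as the sum of $S(i \to b)$ over all other nodes b and the hide information as the sum of $S(i \to b)$ over all other nodes i. The convention that a node's search for itself contributes nothing is an assumption of this example, as is the choice of a star.

```latex
% Star network: one hub h with k >= 2 leaves.
% Path probabilities follow P = (1/k_i) \prod_j 1/(k_j - 1).
\begin{align*}
S(h \to \ell) &= \log_2 k, &
S(\ell \to h) &= 0, &
S(\ell \to \ell') &= \log_2 (k-1), \\
\mathcal{A}_h &= k \log_2 k, &
\mathcal{A}_\ell &= (k-1)\log_2(k-1), \\
\mathcal{H}_h &= 0, &
\mathcal{H}_\ell &= \log_2 k + (k-1)\log_2(k-1).
\end{align*}
% Consistency check: \sum_b \mathcal{H}_b = \sum_i \mathcal{A}_i
%   = k \log_2 k + k(k-1) \log_2 (k-1).
```

For k = 3 this gives access information of about 4.75 bits for the hub and 2 bits for a leaf, and hide information of 0 bits for the hub and about 3.58 bits for a leaf. In this toy case the hub is found without any questions, whereas a leaf, which is a neighbor of the hub, combines good access with being harder to find.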

We will later see that many real world networks are
characterized by a relatively high value of the overall search
information S (Fig. 4), implying that global search abilities are
limited by functional, geographical or other constraints. The
ability to search/hide is however not the only measure of the
communication properties of a network. Another key aspect
of communication handling is associated with prediction of local
traffic to and across nodes in the network. This represents the
"passive" aspect of information handling.

To define the predictability, let us consider messages arriving
to a given node i in a network. Your task, being on node
i, is to guess the "active" neighbor/link from where the next
message arrives. Without prior knowledge, all your local
connections are equal and it would take you $\log_2(k)$ yes/no
questions to guess the active link, where k is the number of
connections of your node. However, if information about the
traffic through links is available, the direction of the next
message can be guessed with fewer questions if the search is biased
towards the more used links. For simplicity we assume that
all communication takes place through the shortest paths and
all nodes communicate in equal amounts with all other nodes.

The predictability, or alternatively the order/disorder of the
traffic around a given node i, is measured by an entropy of
messages that are targeted to a given node i, $T_i$, and an entropy
of all messages across the node, $R_i$ (Fig. 1, b and c). The
predictability based on the messages that are targeted to a given


FIG. 3: Prediction of local communication. The upper panel shows networks obtained by (a) minimizing and (c) maximizing the target entropy
$T = \sum_i T_i / N$ associated with traffic to nodes in the networks, while keeping the degree distribution identical [?] to the original network of
Autonomous Systems in Canada shown in (b). (d to f) show the networks that (d) minimize and (f) maximize the road entropy $R = \sum_i R_i / N$
associated with traffic across nodes in the networks. In (a to c) the nodes are color coded according to the value of $T_i$, while in (d to f) we color
code according to $R_i$.



node i is

$$T_i = -\sum_{j=1}^{k_i} c_{ij} \log_2(c_{ij}), \qquad (3)$$

where $j = 1, 2, \ldots, k_i$ denotes the links from node i to its
immediate neighbors j and $c_{ij}$ is the fraction of the messages
targeted to i that passed through node j. Similarly we use $b_{ij}$,
defined as the fraction of messages that go through node i that
also go through node j, to quantify the entropy associated with
traffic across node i:

$$R_i = -\sum_{j=1}^{k_i} b_{ij} \log_2(b_{ij}). \qquad (4)$$

Technically $b_{ij}$ is proportional to the betweenness [15] of the
link between i and j, whereas $c_{ij}$ rather quantifies a
subdivision of the network around node i. We will refer to $T_i$
as the target entropy, and to $R_i$ as the road entropy, where a
large $T_i$ or $R_i$ means a low predictability.
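
The target entropy can be sketched in a few lines once the traffic model is fixed. The example below assumes that every other reachable node sends the same amount of traffic to node i and that, when several degenerate shortest paths exist, a message is split equally among them; this equal split is an assumption of the example, as are the function name `target_entropy` and the adjacency-dictionary input. It is a sketch, not the implementation used for the reported results.

```python
from collections import deque
from math import log2


def target_entropy(adj, i):
    """Return T_i in bits for an undirected graph given as {node: [neighbors]}."""
    dist = {i: 0}
    sigma = {i: 1}   # sigma[v]: number of shortest paths between i and v
    via = {i: {}}    # via[v][j]: how many of them pass through neighbor j of i
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                via[v] = {}
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
                if u == i:
                    via[v][v] = 1  # v is itself a neighbor of i
                else:
                    for j, n in via[u].items():
                        via[v][j] = via[v].get(j, 0) + n

    # c[j]: fraction of all messages targeted to i that arrive through j.
    senders = [v for v in dist if v != i]
    c = {j: 0.0 for j in adj[i]}
    for v in senders:
        for j, n in via[v].items():
            c[j] += n / sigma[v] / len(senders)
    return sum(-p * log2(p) for p in c.values() if p > 0)


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(target_entropy(star, 0))  # hub: log2(3) bits, every leaf equally likely
print(target_entropy(star, 1))  # leaf: 0 bits, everything arrives via the hub
```

The road entropy follows the same pattern but requires counting all messages that pass through node i, not only those that end there, which is closer to a link-betweenness calculation.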

Fig. 3 shows the values of $T_i$ and $R_i$ for different complex
networks. In Fig. 3, a to c, we examine networks by color
coding the nodes according to target entropy, $T_i$. Fig. 3, d to
f, shows networks color coded according to the road entropy
$R_i$. The bluish hubs reflect that traffic to highly connected
nodes is hard to predict. However, this is not always the case:
the location of nodes with low predictability also depends on
the overall topology of the network. The networks in Fig. 3
are presented so that the entropy increases from, respectively,
a and d to c and f. As the networks get more disorganized,
the number of hubs with disordered traffic increases. Also,
nodes of low degree become more confused as they tend to
position themselves between the hubs. It is interesting that
this positioning of low degree nodes increases the number of
alternative pathways in the system, and thus tends to minimize
the search information S. Therefore the minimal S network
in Fig. 2a is similar to the maximal $R = \sum_i R_i / N$ or $T = \sum_i T_i / N$
networks in Fig. 3, c and f.

Whereas the maximal T and R networks are topologically
similar, this is not at all the case for the minimal T and R
networks in Fig. 3, a and d. The network of minimal T in Fig.
3a concentrates all signaling into a simple star-like structure
with hierarchical features [16]. As a consequence nearly
everybody can easily predict from where the next message will
come. In contrast, minimizing R results in a topology
characterized by hubs on a string forming an "information super
highway" (Fig. 3d). Thus a low road entropy R means that
relatively many links are important, whereas a large R implies
that few links are essential. In this sense R is related to
robustness in an intentional edge attack [17] whereas T reflects
FIG. 4: Measure of relative order in communication networks. A
high Z-score implies relatively high entropy. In all cases we show
$Z = (I - I_r)/\sigma_r$ for the information measures I = S, T and R, by
comparing with $I_r$ for randomized networks with preserved degree
distribution. $\sigma_r$ is the standard deviation of the corresponding $I_r$
sampled over 100 realizations. Results within the shaded area of
two standard deviations are insignificant. All networks have a
relatively high search information S. The two human interaction
networks, CEO [18] and scientific collaborations [19], show a distinct
communication structure characterized by local predictability, low T
and R, and global inefficiency, high S.



robustness in an intentional node attack [17].

We apply our information measures to characterize real
networks in Fig. 4, by comparing a number of networks with their
randomized counterparts [14]. The details of the comparison
are shown in Table I. For each network we show the Z-score
for S, T and R. A large positive Z-score means that the
corresponding network has relatively large entropy. For example
we see that the hardwired Internet is quite "messy" in all
senses: The traffic is unpredictable, implying that the network
is robust, and at the same time one needs relatively large
information handling to transmit packages across the system. In
contrast the social networks, exemplified here by the network
of company executives in the USA, CEO [18], and the scientific
collaboration network, hep-th [19], show a pronounced
pattern of high traffic predictability and large cost of locating any
particular node. These features are characteristic of the
ordered network topologies in Fig. 3, a and d.
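
As a worked illustration of the Z-score, $Z = (I - I_r)/\sigma_r$, one can insert values from Table I. The numbers below assume that an entry such as 15.03(2) in a "randomized" row denotes $I_r = 15.03$ with $\sigma_r = 0.02$; this reading of the parenthetical digits is an assumption of the example and is not stated explicitly in the table caption.

```latex
% Assumption: "15.03(2)" is read as I_r = 15.03 with \sigma_r = 0.02.
\begin{align*}
Z_S(\text{Internet}) &= \frac{16.34 - 15.03}{0.02} \approx 65.5, \\
Z_T(\text{CEO}) &= \frac{1.58 - 3.316}{0.003} \approx -579, \\
Z_S(\text{fly}) &= \frac{14.03 - 13.96}{0.06} \approx 1.2 .
\end{align*}
```

Under this reading the Internet has a strongly positive Z-score for S, the CEO network a strongly negative one for T, and the fly value for S lies within two standard deviations of the randomized networks.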

In Fig. 4 we also investigate networks of physical
interactions among proteins in two organisms, yeast [21, 22] and fly
[23]. Whereas the fly network is quite close to its randomized
counterpart, yeast is reminiscent of the social networks. The
large S for yeast reflects that many of the largest hubs are
positioned on the periphery of the network [14], and therefore
have relatively large entropy $\mathcal{A}_i$, see Fig. 5. This tendency of
hub separation reflects optimization of local communication,
at the cost of global specific signaling. On the other hand the
protein network of the multicellular and more advanced fly,
Drosophila melanogaster, displays a more complicated and
in fact more robust topology as witnessed by the significantly


TABLE I: Measure of order in communication networks. Five
networks together with their size N and the three information-entropy
measures I = S, T and R. In each case we compare the measured
I-value with $I_r$ for randomized networks with preserved
degree distribution. $\sigma_r$ is the standard deviation of the corresponding
$I_r$ sampled over 100 realizations. For all networks we only consider
the largest connected component, which is also maintained during the
randomization. The Internet network is the hardwired Internet of
autonomous systems [20]. The CEO network is chief executive
officers connected by links when they sit on the same board of
directors [18], hep-th is a network of scientists connected by links if they
coauthor a publication [19], yeast is the protein-protein interaction
network in Saccharomyces cerevisiae detected by the two-hybrid
experiment [21], and fly refers to the similar network in Drosophila
melanogaster [23]. Both of these networks are pruned to include
only interactions of high confidence, and in both networks we
compare with their random counterparts where both bait and prey
connectivity of all proteins are preserved. The results on the network of
[21] are reproduced when considering the core of the yeast network
measured by [22]. Furthermore, all results are robust to a 10% random
removal of links except for the fly network which with such a
pruning tends to be closer to the yeast network.

Network        N      S            T           R
Internet       6474   16.34        0.583       0.809
  randomized          15.03(2)     0.499(3)    0.793(3)
CEO            6193   20.693       1.58        1.831
  randomized          12.597(3)    3.316(3)    3.513(1)
hep-th         5835   19.72        0.847       1.211
  randomized          13.48(1)     1.385(5)    1.668(1)
Yeast [21]     921    13.3         0.38        0.722
  randomized          12.5(1)      0.38(1)     0.742(3)
Yeast [22]     417    12.2         0.30        0.662
  randomized          10.7(2)      0.33(2)     0.708(6)
Fly            2915   14.03        0.56        0.931
  randomized          13.96(6)     0.53(1)     0.925(2)



positive Z-scores for T and R entropies. 

Networks are inherently coupled to communication and
indeed their topology reflects this. The optimal topology for
information transfer relies on a system-specific balance between
effective communication (search) and not having the individual
parts being unnecessarily disturbed (hide). We have
presented measures that quantify the ease of global search, S,
and the predictability of local activity, T and R, and illustrated
how they characterize the organization of complex networks.

In particular the networks of corporate CEOs and scientific
co-authorship were found to be highly "predictable", and at
the same time very inefficient in transmitting information.
In contrast the hardwired Internet was found to be locally
unpredictable, and therefore robust against local failures.
Finally the fruit fly, Drosophila melanogaster, has a more
robust protein network than yeast, Saccharomyces cerevisiae,
with better connections between distant parts of the network.
This global communication optimization may reflect that the
multicellular organism must sustain life in cells with many
more different local environments than the single-celled
FIG. 5: Analysis of the protein-protein interaction network in yeast:
(a) shows the core of the yeast protein-protein interaction network
color coded according to $\mathcal{A}_i$. The core data is the reliable subset of
the two-hybrid data set of Ito et al. [22], obtained by selecting the
largest connected component of the network with only interactions
with at least 3 ISTs included. The value of $\mathcal{A}_i$ increases from
yellow through green and blue to red, which marks nodes that have least
access to the rest of the network. Proteins colored in red are
proteins with "specific signaling pattern". (b) shows an example of a
randomized version of the network in (a), also color coded with the
corresponding $\mathcal{A}_i$ values. The network is randomized such that the
number of links for each node (protein) is maintained, while
interaction partners are reshuffled in the same way as in the analysis leading
to Fig. 4. The access information values are substantially lower than in (a),
reflecting a topology with better global access at the cost of higher
possibility to be disturbed.

yeast. 